
151 research outputs found

Entity-Oriented Search

Author: Balog Krisztian
Publication venue: Springer Science and Business Media LLC
Publication date: 10/02/2021

This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. 
A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.

Directory of Open Access Books (DOAB)

Conversational AI from an Information Retrieval Perspective: Remaining Challenges and a Case for User Simulation

Author: Balog Krisztian
Publication venue: CEUR
Publication date: 01/01/2021

Conversational AI is an emerging field of computer science that engages multiple research communities, from information retrieval to natural language processing to dialogue systems. Within this vast space, we focus on conversational information access, a problem that is uniquely suited to be addressed by the information retrieval community. We argue that despite the significant research activity in this area, progress is mostly limited to component-level improvements. There remains a disconnect between current efforts and truly conversational information access systems. Apart from the inherently challenging nature of the problem, the lack of progress, in large part, can be attributed to the shortage of appropriate evaluation methodology and resources. This paper highlights challenges that render both offline and online evaluation methodologies unsuitable for this problem, and discusses the use of user simulation as a viable solution.

NORA - Norwegian Open Research Archives

UiS Brage

Towards Building a Knowledge Base of Monetary Transactions from a News Collection

Authors: Balog Krisztian; Benetka Jan R.; Nørvåg Kjetil
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/09/2017

We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge we face in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally using a supervised learning method to rank these representations by their associated confidence scores. A main innovative element of our approach is that it jointly extracts and stores all attributes of the event as a single representation (quintuple). Using a purpose-built test set, we demonstrate that our supervised learning approach can achieve a 25% improvement in F1-score over baseline methods that consider the earliest, the latest, or the most frequent reporting of the event.
Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '17).

arXiv.org e-Print Archive

Crossref

Ad Hoc Table Retrieval using Semantic Similarity

Authors: Balog Krisztian; Zhang Shuo
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

We introduce and address the problem of ad hoc table retrieval: answering a keyword query with a ranked list of tables. This task is not only interesting on its own account, but is also being used as a core component in many other table-based information access scenarios, such as table completion or table mining. The main novel contribution of this work is a method for performing semantic matching between queries and tables. Specifically, we (i) represent queries and tables in multiple semantic spaces (both discrete sparse and continuous dense vector representations) and (ii) introduce various similarity measures for matching those semantic representations. We consider all possible combinations of semantic representations and similarity measures and use these as features in a supervised learning model. Using a purpose-built test collection based on Wikipedia tables, we demonstrate significant and substantial improvements over a state-of-the-art baseline.
Comment: The Web Conference 2018 (WWW'18).

arXiv.org e-Print Archive

Crossref

UiS Brage

Semantic Answer Type Prediction using BERT: IAI at the ISWC SMART Task 2020

Authors: Balog Krisztian; Setty Vinay
Publication venue: CEUR-WS
Publication date: 01/01/2020

This paper summarizes our participation in the SMART Task of the ISWC 2020 Challenge. A particular question we are interested in answering is how well neural methods, and specifically transformer models, such as BERT, perform on the answer type prediction task compared to traditional approaches. Our main finding is that coarse-grained answer types can be identified effectively with standard text classification methods, with over 95% accuracy, and BERT can bring only marginal improvements. For fine-grained type detection, on the other hand, BERT clearly outperforms previous retrieval-based approaches.

UiS Brage

Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation

Authors: Balog Krisztian; Łajewska Weronika
Publication date: 17/08/2023

Research on conversational search has so far mostly focused on query rewriting and multi-stage passage retrieval. However, synthesizing the top retrieved passages into a complete, relevant, and concise response is still an open challenge. Having snippet-level annotations of relevant passages would enable both (1) the training of response generation models that are able to ground answers in actual statements and (2) the automatic evaluation of the generated responses in terms of completeness. In this paper, we address the problem of collecting high-quality snippet-level answer annotations for two of the TREC Conversational Assistance track datasets. To ensure quality, we first perform a preliminary annotation study, employing different task designs, crowdsourcing platforms, and workers with different qualifications. Based on the outcomes of this study, we refine our annotation protocol before proceeding with the full-scale data collection. Overall, we gather annotations for 1.8k question-paragraph pairs, each annotated by three independent crowd workers. The process of collecting data at this magnitude also led to multiple insights about the problem that can inform the design of future response-generation methods. This is an extended version of the article published with the same title in the Proceedings of CIKM'23.
Comment: Extended version of the paper that appeared in the Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23).

arXiv.org e-Print Archive

Target Type Identification for Entity-Bearing Queries

Authors: Balog Krisztian; Croft W. Bruce; Mikolov Tomas; Sawant Uma
Publication venue: Association for Computing Machinery (ACM)
Publication date: 27/07/2017

Identifying the target types of entity-bearing queries can help improve retrieval performance as well as the overall search experience. In this work, we address the problem of automatically detecting the target types of a query with respect to a type taxonomy. We propose a supervised learning approach with a rich variety of features. Using a purpose-built test collection, we show that our approach outperforms existing methods by a remarkable margin. This is an extended version of the article published with the same title in the Proceedings of SIGIR'17.
Comment: Extended version of SIGIR'17 short paper, 5 pages.

arXiv.org e-Print Archive

Crossref